Machine Learning in Python¶

This project employs Python to predict medical expenses using linear regression. By analyzing factors such as age, BMI, and various health conditions; a model was developed that accurately estimates healthcare costs. The results provide valuable insights for cost management and insurance planning.

Checking the Data Type for each Column¶

Data types for each column were analyzed to confirm data is in the correct format and to aid in selecting appropriate techniques for feature engineering.

The following provides a concise summary of the dataframe. A 'Non-Null Count' was returned for the data columns, indicating there are no missing values in the dataset. The summary also identifies 12 columns with data types of both Integers and Objects (i.e., categories). One can conclude the dataframe has data integrity.

In [79]:
medical_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 986 entries, 0 to 985
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Age                      986 non-null    int64 
 1   Diabetes                 986 non-null    object
 2   BloodPressureProblems    986 non-null    int64 
 3   AnyTransplants           986 non-null    object
 4   AnyChronicDiseases       986 non-null    object
 5   Height                   986 non-null    int64 
 6   Weight                   986 non-null    int64 
 7   KnownAllergies           986 non-null    int64 
 8   HistoryOfCancerInFamily  986 non-null    object
 9   NumberOfMajorSurgeries   986 non-null    object
 10  PremiumPrice             986 non-null    int64 
 11  BloodPressureProblem     986 non-null    object
dtypes: int64(6), object(6)
memory usage: 92.6+ KB

Checking the Statistics for each Column¶

Statistics for each column were analyzed to verify the range of data was consistent for each feature.

Analysis:

The overview shows patients have an age range of 18 to 66. Statistics for Age align with what is to be expected. This could also indicate the insurance company does not accept applications from people under the age of 18 and above the age of 66. A similar analysis was performed for the remaining features and all ranges were deemed appropriate.

In [12]:
medical_df.describe()
Out[12]:
Age Diabetes BloodPressureProblems AnyTransplants AnyChronicDiseases Height Weight KnownAllergies HistoryOfCancerInFamily NumberOfMajorSurgeries PremiumPrice
count 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000 986.000000
mean 41.745436 0.419878 0.468560 0.055781 0.180527 168.182556 76.950304 0.215010 0.117647 0.667343 24336.713996
std 13.963371 0.493789 0.499264 0.229615 0.384821 10.098155 14.265096 0.411038 0.322353 0.749205 6248.184382
min 18.000000 0.000000 0.000000 0.000000 0.000000 145.000000 51.000000 0.000000 0.000000 0.000000 15000.000000
25% 30.000000 0.000000 0.000000 0.000000 0.000000 161.000000 67.000000 0.000000 0.000000 0.000000 21000.000000
50% 42.000000 0.000000 0.000000 0.000000 0.000000 168.000000 75.000000 0.000000 0.000000 1.000000 23000.000000
75% 53.000000 1.000000 1.000000 0.000000 0.000000 176.000000 87.000000 0.000000 0.000000 1.000000 28000.000000
max 66.000000 1.000000 1.000000 1.000000 1.000000 188.000000 132.000000 1.000000 1.000000 3.000000 40000.000000

Exploratory Analysis¶

Exploratory analysis was performed on the data by visualizing the distribution of various features and their relationships with PremiumPrice.

Analysis:

The distributions for each of the features were reflective of the population (i.e., all ranges for each feature were appropriate for the given classification).

In [5]:
medical_df.Age.describe()
# Visualizing the Age Category
fig = px.histogram(medical_df,
                   x='Age',
                   marginal='box',
                   nbins=48,
                   title='Distribution of Age')
fig.update_layout(bargap=0.1)
fig.show()
In [12]:
fig = px.histogram(medical_df,
                   x='Height',
                   marginal='box',
                   color_discrete_sequence=['red'],
                   title='Distribution of Height')
fig.update_layout(bargap=0.1)
fig.show()
In [15]:
fig = px.histogram(medical_df,
                   x='Weight',
                   marginal='box',
                   color_discrete_sequence=['purple'],
                   title='Distribution of Weight')
fig.update_layout(bargap=0.1)
fig.show()
In [6]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='Diabetes',
                   color_discrete_sequence=['green', 'grey'],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
In [60]:
medical_df['BloodPressureProblem'] = medical_df['BloodPressureProblems'].astype('object')
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='BloodPressureProblems',
                   color_discrete_sequence=['blue', 'grey'],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
In [28]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='AnyTransplants',
                   color_discrete_sequence=['purple', 'grey'],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
In [29]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='AnyChronicDiseases',
                   color_discrete_sequence=['orange', 'grey'],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
In [40]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='KnownAllergies',
                   color_discrete_sequence=['red', 'grey'],
                   title='KnownAllergies')
fig.update_layout(bargap=0.1)
fig.show()
In [36]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='HistoryOfCancerInFamily',
                   color_discrete_sequence=['yellow', 'grey'],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()
In [46]:
fig = px.histogram(medical_df,
                   x='PremiumPrice',
                   marginal='box',
                   color='NumberOfMajorSurgeries',
                   color_discrete_sequence=['pink', 'green', 'blue', 'grey' ],
                   title='Premium Prices')
fig.update_layout(bargap=0.1)
fig.show()

Age vs. PremiumPrice Analysis¶

The relationship between Age and PremiumPrice was visualized using a scatterplot. Features in other columns were used to color code the plots.

Analysis:

The general trends for each of the visualizations seem to be that there is an increase in PremiumPrice with Age. There also seems to be numerous variations at every age (i.e., outliers). Even so, the relationship between Age and PremiumPrice displays a moderate correlation. Other values used to color code the plots do not seem to have any correlation with PremiumPrice.

In [75]:
fig = px.scatter(medical_df,
                   x='Age',
                   y='PremiumPrice',
                   color='AnyTransplants',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [62]:
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('object')
fig = px.scatter(medical_df,
                   x='Age',
                   y='PremiumPrice',
                   color='AnyChronicDiseases',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [63]:
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('object')
fig = px.scatter(medical_df,
                   x='Age',
                   y='PremiumPrice',
                   color='HistoryOfCancerInFamily',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [65]:
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('object')
fig = px.scatter(medical_df,
                   x='Age',
                   y='PremiumPrice',
                   color='NumberOfMajorSurgeries',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()

Weight vs. PremiumPrice Analysis¶

The relationship between Weight and PremiumPrice was visualized using a scatterplot. Features in other columns were used to color code the plots.

Analysis:

There does not seem to be any trends and/or relationship between Weight and PremiumPrice when viewing each of the visualizations. There also seems to be numerous variations at every age (i.e., outliers). Other features used to color code the plots do not seem to have any correlation with PremiumPrice.

In [76]:
fig = px.scatter(medical_df,
                   x='Weight',
                   y='PremiumPrice',
                   color='AnyTransplants',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [67]:
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('object')
fig = px.scatter(medical_df,
                   x='Weight',
                   y='PremiumPrice',
                   color='AnyChronicDiseases',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [68]:
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('object')
fig = px.scatter(medical_df,
                   x='Weight',
                   y='PremiumPrice',
                   color='HistoryOfCancerInFamily',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()
In [69]:
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('object')
fig = px.scatter(medical_df,
                   x='Weight',
                   y='PremiumPrice',
                   color='NumberOfMajorSurgeries',
                   opacity=0.8,
                   title='Age vs. PremiumPrice')
fig.update_traces(marker_size=5)
fig.show()

Correlation Analysis¶

As displayed in the scatter plots, Age is more closely related to PremiumPrice compared to other features (i.e., Weight). This relationship is often expressed numerically using a measure known as the correlation coefficient. Categorical columns were converted to numeric columns to compute the correlation.

Analysis:

The correlation coefficients confirmed our assumption that Age was the only feature that had a moderate to significant relationship with PremiumPrice (i.e., any correlation of -0.5 to -1, or 0.5 to 1). All other features had a weak or no relationship.

In [72]:
medical_df.PremiumPrice.corr(medical_df.Age)
Out[72]:
0.6975399655058031
In [73]:
medical_df.PremiumPrice.corr(medical_df.Weight)
Out[73]:
0.14150740525639752
In [80]:
medical_df['AnyTransplants'] = medical_df['AnyTransplants'].astype('int')
medical_df.PremiumPrice.corr(medical_df.AnyTransplants)
Out[80]:
0.2890559369634021
In [81]:
medical_df['AnyChronicDiseases'] = medical_df['AnyChronicDiseases'].astype('int')
medical_df.PremiumPrice.corr(medical_df.AnyTransplants)
Out[81]:
0.2890559369634021
In [82]:
medical_df['HistoryOfCancerInFamily'] = medical_df['HistoryOfCancerInFamily'].astype('int')
medical_df.PremiumPrice.corr(medical_df.HistoryOfCancerInFamily)
Out[82]:
0.08313941651638145
In [83]:
medical_df['NumberOfMajorSurgeries'] = medical_df['NumberOfMajorSurgeries'].astype('int')
medical_df.PremiumPrice.corr(medical_df.NumberOfMajorSurgeries)
Out[83]:
0.26424952935741713
In [84]:
medical_df.corr()
Out[84]:
Age BloodPressureProblems AnyTransplants AnyChronicDiseases Height Weight KnownAllergies HistoryOfCancerInFamily NumberOfMajorSurgeries PremiumPrice
Age 1.000000 0.244888 -0.008549 0.051072 0.039879 -0.018590 -0.024416 -0.027623 0.429181 0.697540
BloodPressureProblems 0.244888 1.000000 -0.024538 0.045424 -0.037926 -0.061016 -0.011550 0.048239 0.251568 0.167097
AnyTransplants -0.008549 -0.024538 1.000000 0.035285 -0.031543 0.002087 0.001876 -0.020171 -0.004154 0.289056
AnyChronicDiseases 0.051072 0.045424 0.035285 1.000000 0.047419 -0.033318 -0.027418 0.008666 0.014835 0.208610
Height 0.039879 -0.037926 -0.031543 0.047419 1.000000 0.066946 -0.010200 0.010549 0.037289 0.026910
Weight -0.018590 -0.061016 0.002087 -0.033318 0.066946 1.000000 0.037492 0.003481 -0.006108 0.141507
KnownAllergies -0.024416 -0.011550 0.001876 -0.027418 -0.010200 0.037492 1.000000 0.115383 0.103923 0.012103
HistoryOfCancerInFamily -0.027623 0.048239 -0.020171 0.008666 0.010549 0.003481 0.115383 1.000000 0.212657 0.083139
NumberOfMajorSurgeries 0.429181 0.251568 -0.004154 0.014835 0.037289 -0.006108 0.103923 0.212657 1.000000 0.264250
PremiumPrice 0.697540 0.167097 0.289056 0.208610 0.026910 0.141507 0.012103 0.083139 0.264250 1.000000
In [85]:
sns.heatmap(medical_df.corr(), cmap='Reds', annot=True)
plt.title('Correlation Matrix');

Linear Regression using a Single Feature¶

Based on our analysis, we know that Age and PremiumPrice have the strongest relationship. To further our analysis, a linear regression model was created to predict PremiumPrice ("target") using the following formula:

PremiumPrice = w * Age + b

Analysis:

A helper function was created to compute PremiumPrice, given Age, w and b. To get a clearer approximation of the slope and intercept (w and b), sample values were inserted into the formula and compared to the Age column in the dataset. Additionally, parameters were also visualized using scatter plots, which compared estimated charges to the actual dataset. Our analysis shows parameters of w and b should be close in range to the values 300 and 11,000.

In [86]:
plt.title('Age vs. PremiumPrice')
sns.scatterplot(data=medical_df, x='Age', y='PremiumPrice', alpha=0.7, s=15);
In [7]:
def estimate_premiums(Age, w, b):
    return w * Age + b
In [88]:
w = 50
b = 100
In [90]:
estimate_premiums(30, w, b)
Out[90]:
1600
In [93]:
medical_df.PremiumPrice
Out[93]:
0      25000
1      29000
2      23000
3      28000
4      23000
       ...  
981    15000
982    28000
983    29000
984    39000
985    15000
Name: PremiumPrice, Length: 986, dtype: int64
In [99]:
try_parameters(150, 15000)
In [105]:
try_parameters(300, 11000)

Loss/Cost Function¶

The model’s predictions can also be computed using the loss/cost function, which measures how well a model's predictions match the actual values in the dataset. The result is known as the root mean squared error ("RMSE"). The RMSE measures the average distance between the statistical model's predicted values and the actual values in the dataset.

Analysis:

Previous sample set weights of 300 and 11,000 were used to compute the RSME. The RSME result was 4551.25. Given the range of PremiumPrice being 25,000 (45,000 - 15,000), a result of 4551.25 is an acceptable RSME. This conclusion can be made because the dataset includes various outliers. However, given the moderate correlation of Age and PremiumPrice, a RSME percentage closer to 10 - 15 percent of the data range would be more suitable. The result currently stands at 18 percent (4551.25/25,000).

In [17]:
def rmse(targets, predictions):
    return np.sqrt(np.mean(np.square(targets - predictions)))
In [10]:
w = 300
b = 11000
In [11]:
targets = medical_df.PremiumPrice
predictions = estimate_premiums(medical_df.Age, w, b)
In [18]:
rmse(targets, predictions)
Out[18]:
4551.257544159166

Linear Regression using Scikit-learn¶

Scikit-learn was used to automate the machine learning process and assist with predicting PremiumPrice. In practice, the previous methods do not need to be implemented, but it is a good way to better understand the data and the analysis gathered can be compared to the final results to verify accuracy.

Analysis:

A new model object was created. Additionally, Scikit-learn requires "X" to be a 2-d array, so the dataframe was passed instead of a column. Next, the model was fitted to the data and PremiumPrice was predicted using the model. The RMSE for the model was also calculated, which had a result of 4474.83. This result is similar to the previous analysis, which had an RMSE of 4551.25. The parameters of the model are stored in the "coef" and "intercept" properties. A scatter plot was created to compare the model to the values in the actual dataset.

In [117]:
from sklearn.linear_model import LinearRegression
In [118]:
model = LinearRegression()
In [129]:
inputs = medical_df[['Age']]
targets = medical_df.PremiumPrice
print('inputs.shape :', inputs.shape)
print('targes.shape :', targets.shape)
inputs.shape : (986, 1)
targes.shape : (986,)
In [128]:
model.fit(inputs, targets)
Out[128]:
LinearRegression()
In [131]:
predictions = model.predict(inputs)
predictions
Out[131]:
array([25352.5543132 , 30034.47338215, 22543.40287183, 27537.44987871,
       23167.65874769, 20670.63524426, 21607.01905805, 18485.73967875,
       26288.93812699, 23167.65874769, 30034.47338215, 31907.24100972,
       18797.86761668, 25664.68225113, 16925.0999891 , 23167.65874769,
       24416.17049941, 23167.65874769, 29098.08956836, 17861.48380289,
       26601.06606492, 17549.35586496, 22231.2749339 , 22231.2749339 ,
       27849.57781664, 20982.76318219, 18173.61174082, 30034.47338215,
       20670.63524426, 21607.01905805, 18173.61174082, 19422.12349254,
       20046.3793684 , 19422.12349254, 31282.98513386, 26913.19400285,
       25040.42637527, 29410.21750629, 24728.29843734, 18797.86761668,
       17549.35586496, 31907.24100972, 19109.99555461, 27537.44987871,
       19422.12349254, 25040.42637527, 19109.99555461, 28161.70575457,
       31282.98513386, 17549.35586496, 20670.63524426, 30970.85719593,
       30346.60132008, 24728.29843734, 29410.21750629, 24104.04256148,
       30034.47338215, 26601.06606492, 18485.73967875, 25664.68225113,
       22543.40287183, 28473.8336925 , 18485.73967875, 20982.76318219,
       31282.98513386, 19422.12349254, 26601.06606492, 17237.22792703,
       23791.91462355, 21294.89112012, 19734.25143047, 28473.8336925 ,
       30970.85719593, 19109.99555461, 27849.57781664, 31907.24100972,
       18485.73967875, 22543.40287183, 26601.06606492, 20670.63524426,
       25040.42637527, 30970.85719593, 19734.25143047, 25664.68225113,
       25664.68225113, 20982.76318219, 25664.68225113, 17549.35586496,
       22231.2749339 , 21607.01905805, 25664.68225113, 30970.85719593,
       27537.44987871, 28161.70575457, 27849.57781664, 17861.48380289,
       21919.14699598, 20982.76318219, 27537.44987871, 24416.17049941,
       18797.86761668, 25352.5543132 , 29722.34544422, 17861.48380289,
       18485.73967875, 21607.01905805, 30034.47338215, 20982.76318219,
       20046.3793684 , 31282.98513386, 28473.8336925 , 22231.2749339 ,
       20358.50730633, 22231.2749339 , 25976.81018906, 19109.99555461,
       28473.8336925 , 29722.34544422, 31907.24100972, 21294.89112012,
       28161.70575457, 30970.85719593, 23167.65874769, 18485.73967875,
       21607.01905805, 20982.76318219, 24728.29843734, 23479.78668562,
       29722.34544422, 17237.22792703, 20046.3793684 , 29722.34544422,
       23167.65874769, 18485.73967875, 21919.14699598, 19734.25143047,
       22855.53080976, 31907.24100972, 27225.32194078, 26288.93812699,
       23791.91462355, 28161.70575457, 21607.01905805, 21607.01905805,
       29722.34544422, 27537.44987871, 22231.2749339 , 29722.34544422,
       27849.57781664, 26913.19400285, 28785.96163043, 25976.81018906,
       29098.08956836, 24416.17049941, 17861.48380289, 30658.729258  ,
       30658.729258  , 17549.35586496, 24728.29843734, 25664.68225113,
       29410.21750629, 17237.22792703, 21919.14699598, 27849.57781664,
       20358.50730633, 21294.89112012, 16925.0999891 , 20046.3793684 ,
       25040.42637527, 24416.17049941, 26288.93812699, 20670.63524426,
       20982.76318219, 25976.81018906, 18797.86761668, 28473.8336925 ,
       24416.17049941, 22231.2749339 , 21607.01905805, 31907.24100972,
       23167.65874769, 22855.53080976, 20670.63524426, 22543.40287183,
       20670.63524426, 25664.68225113, 17237.22792703, 19109.99555461,
       24416.17049941, 22231.2749339 , 20670.63524426, 26288.93812699,
       26913.19400285, 21607.01905805, 24728.29843734, 17237.22792703,
       23479.78668562, 25352.5543132 , 26288.93812699, 17549.35586496,
       23791.91462355, 21294.89112012, 20670.63524426, 18797.86761668,
       19734.25143047, 30034.47338215, 20982.76318219, 16925.0999891 ,
       21294.89112012, 26601.06606492, 24104.04256148, 20358.50730633,
       30034.47338215, 25664.68225113, 27537.44987871, 18173.61174082,
       31907.24100972, 31595.11307179, 30658.729258  , 22231.2749339 ,
       26913.19400285, 30346.60132008, 26601.06606492, 19109.99555461,
       22855.53080976, 29098.08956836, 30034.47338215, 25040.42637527,
       18173.61174082, 28785.96163043, 30970.85719593, 22543.40287183,
       27849.57781664, 20358.50730633, 21294.89112012, 16925.0999891 ,
       22231.2749339 , 21919.14699598, 30658.729258  , 28161.70575457,
       26601.06606492, 17861.48380289, 17549.35586496, 26601.06606492,
       25976.81018906, 28473.8336925 , 20358.50730633, 27537.44987871,
       25976.81018906, 17237.22792703, 19109.99555461, 31595.11307179,
       21294.89112012, 27849.57781664, 30034.47338215, 22855.53080976,
       17549.35586496, 18173.61174082, 29098.08956836, 21607.01905805,
       26288.93812699, 19109.99555461, 28161.70575457, 22543.40287183,
       18797.86761668, 18797.86761668, 28473.8336925 , 20670.63524426,
       23791.91462355, 26601.06606492, 25664.68225113, 30658.729258  ,
       25040.42637527, 22543.40287183, 31282.98513386, 26288.93812699,
       17549.35586496, 20046.3793684 , 25664.68225113, 27537.44987871,
       16925.0999891 , 22231.2749339 , 31282.98513386, 31907.24100972,
       26913.19400285, 26288.93812699, 31907.24100972, 27537.44987871,
       26288.93812699, 31282.98513386, 26601.06606492, 20358.50730633,
       24728.29843734, 25352.5543132 , 23791.91462355, 31282.98513386,
       23791.91462355, 21294.89112012, 27537.44987871, 22231.2749339 ,
       19422.12349254, 19109.99555461, 19109.99555461, 23791.91462355,
       19422.12349254, 24728.29843734, 28785.96163043, 23167.65874769,
       31595.11307179, 31282.98513386, 18485.73967875, 19109.99555461,
       22543.40287183, 23791.91462355, 29722.34544422, 21294.89112012,
       27537.44987871, 27849.57781664, 28473.8336925 , 24416.17049941,
       20046.3793684 , 21607.01905805, 21919.14699598, 18797.86761668,
       30346.60132008, 20358.50730633, 26288.93812699, 23167.65874769,
       27225.32194078, 30346.60132008, 20670.63524426, 29098.08956836,
       30970.85719593, 24728.29843734, 20982.76318219, 28473.8336925 ,
       17237.22792703, 26913.19400285, 25352.5543132 , 25976.81018906,
       25976.81018906, 24728.29843734, 26288.93812699, 19109.99555461,
       21919.14699598, 25976.81018906, 18173.61174082, 21919.14699598,
       30970.85719593, 27225.32194078, 26288.93812699, 21607.01905805,
       24416.17049941, 17237.22792703, 24416.17049941, 31595.11307179,
       25976.81018906, 24104.04256148, 25352.5543132 , 26601.06606492,
       24416.17049941, 21919.14699598, 22855.53080976, 24728.29843734,
       25664.68225113, 28161.70575457, 26288.93812699, 27537.44987871,
       22231.2749339 , 22543.40287183, 30658.729258  , 31282.98513386,
       19109.99555461, 30658.729258  , 20358.50730633, 22855.53080976,
       24728.29843734, 30346.60132008, 28161.70575457, 24104.04256148,
       30658.729258  , 24728.29843734, 17237.22792703, 23167.65874769,
       25352.5543132 , 28161.70575457, 20358.50730633, 30658.729258  ,
       28161.70575457, 25352.5543132 , 31282.98513386, 23791.91462355,
       22231.2749339 , 29410.21750629, 29098.08956836, 20046.3793684 ,
       21607.01905805, 21294.89112012, 31907.24100972, 24728.29843734,
       21294.89112012, 31595.11307179, 21919.14699598, 28473.8336925 ,
       31282.98513386, 25040.42637527, 22231.2749339 , 23167.65874769,
       25352.5543132 , 23479.78668562, 29722.34544422, 22543.40287183,
       30346.60132008, 21607.01905805, 25040.42637527, 20982.76318219,
       16925.0999891 , 25352.5543132 , 21919.14699598, 22855.53080976,
       20358.50730633, 22543.40287183, 18173.61174082, 17549.35586496,
       30346.60132008, 26601.06606492, 19734.25143047, 29722.34544422,
       18485.73967875, 17861.48380289, 29410.21750629, 22855.53080976,
       18797.86761668, 25040.42637527, 26913.19400285, 22855.53080976,
       20358.50730633, 22855.53080976, 18797.86761668, 31595.11307179,
       26288.93812699, 31282.98513386, 18173.61174082, 23479.78668562,
       17237.22792703, 16925.0999891 , 29722.34544422, 30034.47338215,
       19734.25143047, 26913.19400285, 17549.35586496, 23479.78668562,
       31282.98513386, 29410.21750629, 21294.89112012, 17237.22792703,
       17237.22792703, 18173.61174082, 30346.60132008, 29410.21750629,
       20358.50730633, 28785.96163043, 18797.86761668, 31595.11307179,
       18797.86761668, 26288.93812699, 24416.17049941, 25040.42637527,
       19422.12349254, 26601.06606492, 22231.2749339 , 29410.21750629,
       25976.81018906, 29098.08956836, 25664.68225113, 24104.04256148,
       16925.0999891 , 26601.06606492, 21294.89112012, 25352.5543132 ,
       19734.25143047, 28785.96163043, 24416.17049941, 31282.98513386,
       29410.21750629, 28785.96163043, 29410.21750629, 25352.5543132 ,
       25352.5543132 , 20670.63524426, 21607.01905805, 29722.34544422,
       19734.25143047, 20670.63524426, 31282.98513386, 21919.14699598,
       27225.32194078, 22231.2749339 , 30658.729258  , 25664.68225113,
       25352.5543132 , 19422.12349254, 26913.19400285, 31282.98513386,
       27537.44987871, 22543.40287183, 28161.70575457, 26601.06606492,
       27225.32194078, 31282.98513386, 17861.48380289, 22231.2749339 ,
       17237.22792703, 30970.85719593, 23479.78668562, 24728.29843734,
       25976.81018906, 30346.60132008, 24416.17049941, 17549.35586496,
       16925.0999891 , 26913.19400285, 27537.44987871, 22231.2749339 ,
       17861.48380289, 22855.53080976, 28161.70575457, 24416.17049941,
       27225.32194078, 20982.76318219, 20046.3793684 , 20046.3793684 ,
       26601.06606492, 25664.68225113, 23167.65874769, 25664.68225113,
       31595.11307179, 19109.99555461, 19109.99555461, 30658.729258  ,
       28161.70575457, 29722.34544422, 31282.98513386, 30970.85719593,
       30346.60132008, 25976.81018906, 24104.04256148, 29722.34544422,
       16925.0999891 , 20046.3793684 , 31907.24100972, 29410.21750629,
       29722.34544422, 31282.98513386, 30970.85719593, 22855.53080976,
       16925.0999891 , 20358.50730633, 24416.17049941, 18173.61174082,
       22543.40287183, 22855.53080976, 21294.89112012, 27225.32194078,
       23479.78668562, 30346.60132008, 19734.25143047, 28473.8336925 ,
       31907.24100972, 29410.21750629, 30034.47338215, 24728.29843734,
       28161.70575457, 17237.22792703, 31907.24100972, 27225.32194078,
       21919.14699598, 24728.29843734, 20670.63524426, 24104.04256148,
       16925.0999891 , 26601.06606492, 26913.19400285, 24728.29843734,
       19734.25143047, 24728.29843734, 20358.50730633, 20982.76318219,
       24416.17049941, 20982.76318219, 24104.04256148, 20982.76318219,
       25976.81018906, 19734.25143047, 25040.42637527, 29410.21750629,
       19734.25143047, 28473.8336925 , 27225.32194078, 18173.61174082,
       20046.3793684 , 25040.42637527, 26601.06606492, 25664.68225113,
       19422.12349254, 30658.729258  , 19734.25143047, 25352.5543132 ,
       30034.47338215, 30346.60132008, 19422.12349254, 29410.21750629,
       19734.25143047, 26913.19400285, 29098.08956836, 29722.34544422,
       24728.29843734, 17861.48380289, 24416.17049941, 29722.34544422,
       18797.86761668, 21294.89112012, 25352.5543132 , 24416.17049941,
       21919.14699598, 25352.5543132 , 20046.3793684 , 22543.40287183,
       18797.86761668, 24104.04256148, 24104.04256148, 26288.93812699,
       24104.04256148, 27225.32194078, 19734.25143047, 22231.2749339 ,
       23791.91462355, 24104.04256148, 17861.48380289, 25040.42637527,
       22543.40287183, 25352.5543132 , 25040.42637527, 27225.32194078,
       17549.35586496, 30970.85719593, 27225.32194078, 18173.61174082,
       30970.85719593, 20670.63524426, 23791.91462355, 29722.34544422,
       17861.48380289, 20670.63524426, 24728.29843734, 24416.17049941,
       30970.85719593, 23479.78668562, 30970.85719593, 31907.24100972,
       25040.42637527, 20046.3793684 , 23167.65874769, 27849.57781664,
       24728.29843734, 17861.48380289, 26913.19400285, 31907.24100972,
       19734.25143047, 26601.06606492, 22543.40287183, 23791.91462355,
       20982.76318219, 29722.34544422, 24728.29843734, 27225.32194078,
       18173.61174082, 30346.60132008, 31907.24100972, 31595.11307179,
       28161.70575457, 21919.14699598, 31907.24100972, 29722.34544422,
       18797.86761668, 27225.32194078, 25040.42637527, 26601.06606492,
       25664.68225113, 25976.81018906, 27849.57781664, 22231.2749339 ,
       17549.35586496, 26288.93812699, 31907.24100972, 24728.29843734,
       19734.25143047, 18173.61174082, 28785.96163043, 29722.34544422,
       30034.47338215, 26601.06606492, 20358.50730633, 24728.29843734,
       25664.68225113, 26913.19400285, 20358.50730633, 25976.81018906,
       28785.96163043, 21607.01905805, 24104.04256148, 23167.65874769,
       29722.34544422, 29098.08956836, 30658.729258  , 21294.89112012,
       28161.70575457, 26913.19400285, 28473.8336925 , 27537.44987871,
       27225.32194078, 22855.53080976, 16925.0999891 , 28785.96163043,
       23167.65874769, 26288.93812699, 30034.47338215, 25352.5543132 ,
       18485.73967875, 29098.08956836, 23479.78668562, 31595.11307179,
       26913.19400285, 28785.96163043, 19734.25143047, 26913.19400285,
       31907.24100972, 27849.57781664, 19422.12349254, 19109.99555461,
       19734.25143047, 17237.22792703, 28785.96163043, 21607.01905805,
       17237.22792703, 18797.86761668, 18797.86761668, 22855.53080976,
       24416.17049941, 17549.35586496, 30970.85719593, 17861.48380289,
       31595.11307179, 20670.63524426, 21294.89112012, 27849.57781664,
       31282.98513386, 25040.42637527, 25664.68225113, 19734.25143047,
       24728.29843734, 18485.73967875, 17237.22792703, 24728.29843734,
       16925.0999891 , 19109.99555461, 29722.34544422, 29722.34544422,
       20670.63524426, 29722.34544422, 27849.57781664, 23791.91462355,
       24104.04256148, 26601.06606492, 25976.81018906, 31595.11307179,
       24416.17049941, 24416.17049941, 24104.04256148, 25352.5543132 ,
       21919.14699598, 21294.89112012, 18797.86761668, 30970.85719593,
       28785.96163043, 30034.47338215, 19734.25143047, 19734.25143047,
       20670.63524426, 25976.81018906, 22543.40287183, 29410.21750629,
       17549.35586496, 21294.89112012, 21607.01905805, 16925.0999891 ,
       20358.50730633, 21919.14699598, 29098.08956836, 25040.42637527,
       21607.01905805, 18797.86761668, 18173.61174082, 22543.40287183,
       25040.42637527, 20358.50730633, 19734.25143047, 30658.729258  ,
       21607.01905805, 26288.93812699, 19109.99555461, 22231.2749339 ,
       30658.729258  , 21607.01905805, 25976.81018906, 26288.93812699,
       24416.17049941, 31595.11307179, 31595.11307179, 26913.19400285,
       23791.91462355, 31595.11307179, 16925.0999891 , 30658.729258  ,
       22543.40287183, 21294.89112012, 27537.44987871, 19734.25143047,
       31595.11307179, 20982.76318219, 27537.44987871, 19422.12349254,
       23791.91462355, 19109.99555461, 17861.48380289, 17861.48380289,
       20358.50730633, 26913.19400285, 25040.42637527, 28785.96163043,
       26288.93812699, 28785.96163043, 26913.19400285, 30034.47338215,
       16925.0999891 , 24104.04256148, 22231.2749339 , 18797.86761668,
       18173.61174082, 17549.35586496, 26288.93812699, 16925.0999891 ,
       30346.60132008, 22855.53080976, 28161.70575457, 20358.50730633,
       17861.48380289, 28473.8336925 , 31595.11307179, 20046.3793684 ,
       30346.60132008, 25040.42637527, 27849.57781664, 19109.99555461,
       21607.01905805, 31907.24100972, 20046.3793684 , 30658.729258  ,
       19109.99555461, 16925.0999891 , 23479.78668562, 20046.3793684 ,
       24416.17049941, 22231.2749339 , 27225.32194078, 26288.93812699,
       27225.32194078, 18485.73967875, 27849.57781664, 20982.76318219,
       25040.42637527, 27225.32194078, 22855.53080976, 18173.61174082,
       25664.68225113, 18173.61174082, 27537.44987871, 21294.89112012,
       25352.5543132 , 24728.29843734, 29410.21750629, 24416.17049941,
       30658.729258  , 25352.5543132 , 25976.81018906, 19734.25143047,
       16925.0999891 , 28473.8336925 , 23167.65874769, 22855.53080976,
       27225.32194078, 17237.22792703, 21919.14699598, 29098.08956836,
       21919.14699598, 23479.78668562, 28473.8336925 , 20670.63524426,
       30658.729258  , 19109.99555461, 17237.22792703, 27225.32194078,
       20670.63524426, 23791.91462355, 17861.48380289, 28161.70575457,
       30970.85719593, 31907.24100972, 24728.29843734, 25976.81018906,
       20358.50730633, 27849.57781664, 21607.01905805, 17861.48380289,
       30658.729258  , 24104.04256148, 18797.86761668, 20982.76318219,
       17237.22792703, 18173.61174082, 24104.04256148, 30346.60132008,
       24416.17049941, 28161.70575457, 28161.70575457, 23167.65874769,
       17237.22792703, 26288.93812699, 20982.76318219, 24728.29843734,
       23791.91462355, 22231.2749339 , 19109.99555461, 27849.57781664,
       24728.29843734, 28473.8336925 , 25352.5543132 , 20358.50730633,
       30034.47338215, 20670.63524426, 27537.44987871, 19109.99555461,
       25040.42637527, 30658.729258  , 30970.85719593, 17861.48380289,
       25664.68225113, 19734.25143047, 16925.0999891 , 31907.24100972,
       18485.73967875, 29722.34544422, 21294.89112012, 26601.06606492,
       19734.25143047, 22231.2749339 , 31907.24100972, 24416.17049941,
       16925.0999891 , 25352.5543132 , 25664.68225113, 19422.12349254,
       20982.76318219, 20046.3793684 , 25976.81018906, 25040.42637527,
       17861.48380289, 25352.5543132 , 23791.91462355, 18797.86761668,
       23791.91462355, 16925.0999891 , 31282.98513386, 28785.96163043,
       25976.81018906, 17861.48380289])
In [132]:
rmse(targets, predictions)
Out[132]:
4474.83985341589
In [133]:
model.coef_
Out[133]:
array([312.12793793])
In [134]:
# b
model.intercept_
Out[134]:
11306.797106367303
In [135]:
try_parameters(model.coef_, model.intercept_)

Linear Regression using Multiple Features¶

Based on our analysis, we know that Age and PremiumPrice have the strongest relationship. A second feature ("Weight") was added to the model to possibly reduce the loss. The formula is as follows:

PremiumPrice = w1 Age + w2 Weight + b

Analysis:

The RMSE with Weight included has a result of 4369.57. This is slightly less than the single feature model, which had a RMSE of 4474.83. As expected Weight does not significantly lower the costs due to a weak correlation of 0.14.

In [136]:
inputs = medical_df[['Age', 'Weight']]
targets = medical_df.PremiumPrice
print('inputs.shape :', inputs.shape)
print('targes.shape :', targets.shape)
inputs.shape : (986, 2)
targes.shape : (986,)
In [137]:
model.fit(inputs, targets)
Out[137]:
LinearRegression()
In [138]:
predictions = model.predict(inputs)
predictions
Out[138]:
array([24006.41690067, 29790.56580777, 21321.06467302, 28636.94320625,
       23910.73428592, 20117.42655092, 20042.40327006, 18600.37501815,
       26097.28911393, 24249.15545688, 29858.25004196, 31264.94086664,
       17424.73527661, 27027.199679  , 16830.25526223, 22557.04960209,
       23743.01901057, 23504.62888077, 28038.11476547, 19191.86441226,
       27967.43991101, 17795.50325453, 23241.23099068, 22632.07288295,
       29221.09355368, 19618.62915129, 19505.27782293, 30805.82932065,
       20388.16348769, 20313.14020682, 19437.59358874, 20352.82606046,
       19422.91549539, 20149.77335788, 31856.43026075, 27062.53710623,
       23557.63502161, 28351.52817613, 25004.01169992, 19725.99923912,
       16915.60821004, 33227.78365819, 17602.78021889, 27892.41663014,
       18051.56209795, 25114.37240802, 18550.35949758, 29466.82273016,
       29893.58746919, 17930.87172291, 21065.0058296 , 31001.54297655,
       29088.71570557, 25680.85404184, 28216.15970775, 23158.86866314,
       29925.93427616, 25395.43901173, 18600.37501815, 25335.09382421,
       22065.59124913, 27411.28794413, 18668.05925234, 21039.99806932,
       30367.37710853, 19540.61525016, 27967.43991101, 16060.72092583,
       22168.61291055, 19932.04256196, 18094.23857185, 29644.86767245,
       30392.38486882, 18482.67526338, 26310.67148345, 31806.41474017,
       18938.79618911, 22336.3281859 , 25057.01784077, 21335.74276637,
       24166.79312934, 29444.80559014, 18500.343977  , 25538.14652678,
       24252.14607714, 21243.05077189, 26824.14697642, 18472.34559644,
       21549.12513589, 20313.14020682, 24590.5672481 , 32219.85919199,
       27486.31122499, 27436.29570441, 27867.40886985, 19056.49594388,
       21709.50136456, 21175.3665377 , 26606.4161805 , 25773.54603632,
       20064.42041008, 26307.68086318, 30763.15284674, 16416.81081041,
       19751.00699941, 20989.98254874, 28775.3022949 , 22258.31428476,
       19896.70513473, 29690.53476662, 29441.81496987, 20939.96702816,
       20142.43431121, 21210.70396493, 25919.24417164, 18618.04373177,
       29103.39379891, 28732.625821  , 30452.73005634, 20067.41103035,
       27300.92723603, 30663.12180559, 22827.78653885, 19344.90159426,
       22072.9302958 , 22190.63005057, 24665.59052897, 22058.25220246,
       29544.83663129, 16196.08939422, 18407.65198252, 30153.99473902,
       24384.52392526, 19751.00699941, 22860.13334581, 19989.39712921,
       23461.95240686, 31806.41474017, 27849.74015624, 25352.76253782,
       25011.3507466 , 26962.50606507, 20854.61408036, 20922.29831455,
       30018.62627064, 27418.6269908 , 21413.7566675 , 29138.73122614,
       28002.77733824, 25302.74701725, 28469.2279309 , 28626.61353931,
       28647.27287319, 25299.75639698, 20613.23333028, 32921.7092942 ,
       30146.65569234, 17457.08208357, 27711.38106759, 25132.04112163,
       32074.16105667, 17279.03714128, 23333.92298516, 27596.67193309,
       20277.80277959, 23113.20156897, 16897.93949642, 20844.28441341,
       24775.95123706, 24622.91405506, 27315.60532938, 20861.95312703,
       21107.68230351, 26393.03381098, 18507.68302367, 31133.92082466,
       24013.75594734, 24188.81026936, 21396.08795389, 32009.46744275,
       22557.04960209, 24341.84745135, 21268.05853218, 25382.11872452,
       22689.4274502 , 26621.09427385, 20730.93308505, 21257.72886524,
       25502.80909955, 23782.70486421, 21606.47970314, 28466.23731063,
       30311.38034742, 23629.66768221, 25207.0644025 , 20933.98578763,
       26390.04319072, 26239.99662899, 27857.07920291, 18133.92442548,
       24740.61380983, 21624.14841675, 21944.90087409, 22230.31590421,
       22629.08226268, 32091.82977028, 23476.63050021, 20011.41426923,
       22368.67499286, 28847.3349555 , 25324.76415727, 20548.53971636,
       31482.67166256, 27974.77895768, 28298.52203529, 19099.17241778,
       33701.57329754, 34944.89727327, 33666.23587031, 24053.44180097,
       28619.27449263, 33149.76975706, 28508.91378454, 20107.09688398,
       23935.74204621, 32572.9584563 , 31347.30319418, 28227.84718083,
       21197.38367772, 29484.49144378, 29715.54252691, 22336.3281859 ,
       27596.67193309, 19127.17079834, 20135.09526454, 16491.83409127,
       23308.91522487, 21980.23830132, 31500.34037618, 29263.77002759,
       27832.07144262, 18311.96936777, 16238.76586812, 27358.28180328,
       26596.08651356, 27208.23524155, 19939.38160864, 27283.25852242,
       25107.03336135, 17888.19524901, 19971.7284156 , 30680.7905192 ,
       22368.67499286, 29221.09355368, 30873.51355484, 22852.79429914,
       17389.39784938, 16730.22422108, 30407.06296217, 22208.29876419,
       26841.81569004, 18008.88562404, 27165.55876765, 23757.69710392,
       18846.10419463, 19252.20959978, 27749.70911508, 21606.47970314,
       24199.1399363 , 26139.96558783, 24387.51454553, 31026.55073683,
       24099.10889515, 21997.90701494, 32059.48296332, 26774.13145585,
       16374.1343365 , 19084.49432443, 25741.19922936, 27689.36392757,
       16627.20255966, 21752.17783846, 31585.69332398, 31671.04627179,
       27197.90557461, 26977.18415842, 30926.51969568, 26200.31077535,
       25420.44677201, 30908.85098207, 26343.01829041, 20277.80277959,
       24259.48512382, 26104.6281606 , 24469.87687306, 32262.5356659 ,
       23386.929126  , 21759.51688513, 28298.52203529, 22902.80981972,
       19202.1940792 , 18685.72796596, 18956.46490272, 24537.56110726,
       18660.72020567, 25004.01169992, 30025.96531731, 22760.10230466,
       31899.10673465, 32465.58836847, 17991.21691043, 20377.83382075,
       23148.5389962 , 25011.3507466 , 29815.57356806, 20811.93760645,
       26944.83735146, 27935.09310404, 29035.70956472, 23472.28207381,
       20032.07360312, 22952.82534029, 23266.23875096, 18643.05149206,
       30171.66345263, 21360.75052666, 26232.65758231, 22624.73383628,
       27782.05592205, 30916.19002874, 20185.11078511, 30474.74719636,
       31204.59567912, 23785.69548448, 21649.15617704, 29712.55190664,
       16737.56326775, 27536.32674557, 24277.15383743, 25039.34912715,
       26325.34957679, 23650.32701609, 27586.34226614, 18211.93832662,
       21235.71172522, 25648.50723488, 16865.59268946, 22115.60676971,
       30053.96369787, 27105.21358013, 27450.97379776, 20786.92984616,
       25029.01946021, 18565.03759092, 25096.7036944 , 30951.52745597,
       26189.98110841, 22820.44749218, 25495.47005288, 25666.17594849,
       25164.3879286 , 22318.65947228, 23935.74204621, 24462.53782639,
       26485.72580546, 28586.92768567, 26435.71028489, 26741.78464888,
       22293.65171199, 21253.38043883, 30688.12956588, 31111.90368464,
       20310.14958656, 29740.5502872 , 20548.53971636, 23191.2154701 ,
       24665.59052897, 30713.13732617, 27774.71687537, 23903.39523924,
       29537.49758462, 23379.59007933, 18565.03759092, 22895.47077304,
       24006.41690067, 29128.4015592 , 20007.06584283, 30823.49803426,
       26962.50606507, 26036.94392641, 29893.58746919, 23793.03453115,
       21278.38819912, 29096.05475224, 29053.37827834, 18746.07315348,
       20989.98254874, 20202.77949873, 31467.99356921, 23447.27431352,
       19999.72679615, 29936.2639431 , 21709.50136456, 28426.551457  ,
       31179.58791883, 23354.58231904, 21007.65126235, 21609.4703234 ,
       25360.1015845 , 21922.88373407, 29477.1523971 , 22404.01242009,
       29900.92651587, 21328.4037197 , 23422.26655323, 20701.57689836,
       15679.62328097, 24886.31194516, 20220.44821234, 22785.11006495,
       19127.17079834, 22200.95971751, 16797.90845527, 16915.60821004,
       28953.34723719, 25530.80748011, 18906.44938215, 27988.09924489,
       17111.32186594, 17432.07432328, 28419.21241033, 22243.63619142,
       18710.73572625, 24302.16159772, 26656.43170108, 21905.21502046,
       19871.69737444, 21160.68844435, 17424.73527661, 30342.36934825,
       25082.02560106, 30435.06134273, 17271.69809461, 21787.51526569,
       16940.61597032, 15950.36021774, 27988.09924489, 29858.25004196,
       19515.60748987, 26859.48440365, 17524.76631776, 23208.88418371,
       31044.21945045, 27674.68583422, 20676.56913807, 15586.93128649,
       16737.56326775, 17948.54043652, 29968.61075006, 28419.21241033,
       18924.11809576, 27521.64865222, 17763.15644757, 31696.05403208,
       18575.36725786, 25826.55217716, 23133.86090285, 24979.00393964,
       18931.45714244, 26343.01829041, 21075.33549655, 28486.89664452,
       25986.92840584, 27699.69359451, 24116.77760876, 23632.65830248,
       16085.72868612, 26478.38675879, 20473.5164355 , 24074.10113486,
       18771.08091376, 28266.17522833, 23404.59783962, 30164.32440596,
       27674.68583422, 28198.49099414, 28486.89664452, 25427.78581869,
       23803.36419809, 19102.16303805, 20042.40327006, 29138.73122614,
       18838.76514796, 18966.79456967, 30299.69287434, 20897.29055426,
       26428.37123821, 21278.38819912, 30011.28722396, 24590.5672481 ,
       25630.83852126, 19134.50984501, 26791.80016946, 32330.21990009,
       28163.15356691, 23216.22323039, 28451.55921729, 26884.49216394,
       27646.68745366, 31721.06179237, 18921.12747549, 22022.91477523,
       18158.93218577, 31678.38531846, 23547.30535467, 24733.27476316,
       27340.61308967, 30103.97921844, 25502.80909955, 18269.29289387,
       17642.46607253, 28010.11638491, 27960.10086433, 22835.12558553,
       18515.02207035, 22785.11006495, 28993.03309082, 24893.65099183,
       28391.21402977, 21513.78770866, 20370.49477407, 20979.6528818 ,
       26884.49216394, 26350.35733708, 23436.94464658, 26824.14697642,
       31763.73826627, 19633.30724464, 19227.20183949, 32041.81424971,
       28519.24345148, 29612.52086549, 31924.11449494, 31001.54297655,
       30239.34768682, 26122.29687422, 24106.44794182, 30018.62627064,
       16694.88679385, 20099.75783731, 32483.25708209, 30246.6867335 ,
       30560.10014417, 31179.58791883, 31001.54297655, 23732.68934363,
       18116.25571187, 20074.75007702, 25705.86180213, 19166.85665197,
       22877.80205943, 24003.4262804 , 22165.62229028, 28391.21402977,
       24427.20039916, 31660.71660485, 19854.02866083, 29238.7622673 ,
       32889.36248724, 29231.42322062, 29858.25004196, 25545.48557346,
       28451.55921729, 17075.98443871, 31671.04627179, 27646.68745366,
       21912.55406713, 24868.64323154, 21741.84817152, 24377.18487858,
       17033.3079648 , 26546.07099298, 26994.85287204, 25951.59097861,
       20598.55523694, 24800.95899735, 21563.80322923, 20769.26113255,
       25299.75639698, 21310.73500608, 24918.65875212, 21446.10347447,
       27069.8761529 , 21072.34487628, 25791.21474993, 29569.84439158,
       20463.18676855, 29441.81496987, 27037.52934594, 18422.33007587,
       20438.17900827, 25723.53051574, 26749.12369556, 26485.72580546,
       20352.82606046, 32041.81424971, 20124.7655976 , 25224.73311611,
       31211.93472579, 31051.55849712, 20758.93146561, 29299.10745482,
       20124.7655976 , 27400.95827718, 30204.01025959, 30086.31050483,
       25680.85404184, 17973.54819681, 24352.1771183 , 29680.20509968,
       19793.68347331, 21759.51688513, 25833.89122384, 24758.28252345,
       22724.76487743, 26510.73356575, 20641.23171084, 22877.80205943,
       19455.26230235, 24918.65875212, 22955.81596056, 25149.70983525,
       25460.13262565, 26360.68700402, 19515.60748987, 22022.91477523,
       22710.08678408, 23429.6055999 , 17905.86396262, 24572.89853449,
       23554.64440134, 26104.6281606 , 24234.47736353, 26563.7397066 ,
       16171.08163393, 30933.85874236, 27308.2662827 , 18422.33007587,
       30189.33216625, 19982.05808254, 25417.45615175, 28597.25735261,
       18041.232431  , 21471.11123475, 24665.59052897, 23607.65054219,
       30866.17450816, 21922.88373407, 32490.59612876, 31400.30933502,
       23693.00349   , 19558.28396378, 23098.52347562, 27055.19805955,
       23921.06395286, 17702.81126005, 25370.43125144, 31874.09897436,
       18297.29127442, 26410.7025246 , 22674.74935685, 24875.98227821,
       21716.84041123, 29274.09969453, 26086.95944699, 28865.00366911,
       19776.0147597 , 31931.45354161, 33498.52059496, 31899.10673465,
       27707.03264118, 20761.92208588, 32280.20437951, 28732.625821  ,
       20132.10464427, 26360.68700402, 25655.84628155, 26546.07099298,
       25470.46229259, 25580.82300069, 27528.98769889, 20736.91432559,
       18946.13523578, 25082.02560106, 33566.20482915, 26019.2752128 ,
       18838.76514796, 18828.43548101, 29619.85991216, 28597.25735261,
       30805.82932065, 25463.12324592, 20751.59241893, 23311.90584514,
       25741.19922936, 28348.53755587, 19194.85503253, 24633.243722  ,
       30364.38648827, 22275.98299838, 24038.76370763, 24316.83969107,
       28732.625821  , 30339.37872798, 31703.39307875, 19932.04256196,
       27097.87453346, 27739.37944814, 28223.49875442, 26741.78464888,
       26969.84511175, 22311.32042561, 16491.83409127, 30229.01801988,
       24384.52392526, 24878.97289848, 30738.14508645, 24074.10113486,
       17246.69033432, 28444.22017061, 24494.88463335, 32305.2121398 ,
       28483.90602425, 29552.17567797, 20057.0813634 , 28348.53755587,
       31332.62510083, 28273.514275  , 19134.50984501, 19565.62301045,
       20124.7655976 , 16872.93173613, 28469.2279309 , 22072.9302958 ,
       17143.6686729 , 18981.47266301, 19184.52536559, 22785.11006495,
       26044.28297309, 17998.5559571 , 32152.1749578 , 18041.232431  ,
       32169.84367142, 21335.74276637, 21082.67454322, 27799.72463566,
       30028.95593758, 23896.05619257, 24387.51454553, 19786.34442664,
       23785.69548448, 19141.84889168, 15722.29975488, 25274.74863669,
       18319.30841444, 18888.78066853, 30898.52131513, 28597.25735261,
       19846.68961416, 28258.83618165, 27461.3034647 , 23386.929126  ,
       22820.44749218, 25598.4917143 , 27340.61308967, 32711.31754495,
       25164.3879286 , 24622.91405506, 24241.8164102 , 23803.36419809,
       20491.18514911, 20270.46373292, 19793.68347331, 31543.01685008,
       29552.17567797, 28910.67076328, 19109.50208472, 20327.81830017,
       20929.63736122, 27272.92885547, 21456.43314141, 29908.26556254,
       16306.45010231, 21894.88535352, 21937.56182742, 16762.57102804,
       21699.17169762, 23063.18604839, 29730.22062025, 23693.00349   ,
       20583.87714359, 18846.10419463, 16797.90845527, 21253.38043883,
       25588.16204736, 20413.17124798, 19312.5547873 , 30755.81380007,
       20380.82444102, 27518.65803195, 20310.14958656, 23376.59945906,
       32244.86695228, 20854.61408036, 25716.19146907, 26097.28911393,
       23066.17666866, 32779.00177914, 33117.4229501 , 26047.27359335,
       23116.19218923, 31831.42250046, 17236.36066738, 29808.23452139,
       23757.69710392, 22707.09616382, 27080.20581984, 20327.81830017,
       32034.47520303, 21919.89311381, 27892.41663014, 18728.40443986,
       24199.1399363 , 20107.09688398, 19327.23288064, 18582.70630454,
       19330.22350091, 26724.11593527, 23693.00349   , 27589.33288641,
       27112.5526268 , 27995.43829156, 26588.74746689, 29113.72346586,
       15747.30751517, 22820.44749218, 22090.59900942, 20335.15734684,
       18760.75124682, 16780.23974165, 26097.28911393, 16559.51832546,
       29630.1895791 , 22311.32042561, 27503.97993861, 20142.43431121,
       17702.81126005, 28223.49875442, 31357.63286112, 19490.59972958,
       29630.1895791 , 24911.31970544, 27732.04040147, 18888.78066853,
       21328.4037197 , 31197.25663245, 19490.59972958, 30011.28722396,
       18347.306795  , 16288.7813887 , 22870.46301276, 19422.91549539,
       23878.38747896, 21684.49360427, 26902.16087756, 25623.49947459,
       26496.05547241, 17923.53267624, 27258.25076213, 20363.1557274 ,
       24302.16159772, 26563.7397066 , 22175.95195723, 17813.17196814,
       25538.14652678, 17813.17196814, 26809.46888308, 20676.56913807,
       24750.94347677, 24259.48512382, 29299.10745482, 23743.01901057,
       30011.28722396, 25021.68041354, 25445.4545323 , 19312.5547873 ,
       16762.57102804, 28088.13028604, 22692.41807047, 22446.68889399,
       26699.10817498, 16872.93173613, 21641.81713037, 28850.32557576,
       21777.18559875, 23276.56841791, 28020.44605185, 20117.42655092,
       30282.02416073, 18888.78066853, 16805.24750194, 26902.16087756,
       20252.79501931, 23319.24489181, 17296.7058549 , 27910.08534376,
       30257.01640044, 31197.25663245, 24597.90629477, 25580.82300069,
       19668.64467187, 27732.04040147, 21125.35101712, 17567.44279166,
       30078.97145815, 23429.6055999 , 18101.57761852, 20701.57689836,
       16534.51056518, 17813.17196814, 23971.07947344, 29833.24228168,
       24284.49288411, 27639.34840699, 27707.03264118, 22760.10230466,
       16669.87903356, 25691.18370878, 20227.78725902, 24394.8535922 ,
       23048.50795504, 21616.80937008, 18618.04373177, 27122.88229375,
       24462.53782639, 28223.49875442, 25089.36464773, 20007.06584283,
       29519.82887101, 20252.79501931, 26877.15311727, 18550.35949758,
       24302.16159772, 29943.60298977, 30730.80603978, 17093.65315232,
       24996.67265325, 19583.29172406, 16424.14985708, 31806.41474017,
       18194.269613  , 29003.36275776, 20811.93760645, 25936.91288526,
       19177.18631891, 21549.12513589, 31535.67780341, 23743.01901057,
       16627.20255966, 24615.57500839, 25538.14652678, 18660.72020567,
       20836.94536674, 19287.54702701, 25242.40182973, 24911.31970544,
       17635.12702586, 24683.25924258, 23319.24489181, 18372.31455529,
       23589.98182857, 16221.09715451, 30841.16674788, 28401.54369671,
       25716.19146907, 17702.81126005])
In [139]:
rmse(targets, predictions)
Out[139]:
4369.5796297419165

Model Imporovements¶

Improvements to the model can be made by adjusting the weight of numerical features in the dataset. This can be achieved by a process known as feature scaling through standardization. It involves rescaling each feature such that it has a standard deviation of 1 and mean of 0 for numeric features in the dataset.

Analysis:

With the addition of feature scaling, the RMSE improved to 3731.82. This brings the results closer to 10 - 15 percent of the data range as initially noted. The values of the standardized features are also displayed.

In [147]:
numeric_cols = ['Age', 'Height', 'Weight', 'NumberOfMajorSurgeries'] 
scaler = StandardScaler()
scaler.fit(medical_df[numeric_cols])
Out[147]:
StandardScaler()
In [148]:
scaled_inputs = scaler.transform(medical_df[numeric_cols])
scaled_inputs
Out[148]:
array([[ 0.23319694, -1.30610453, -1.39924954, -0.89118667],
       [ 1.30798124,  1.17085167, -0.27706151, -0.89118667],
       [-0.41167363, -1.00886978, -1.25897603,  0.44423895],
       ...,
       [ 1.02137209, -1.30610453, -0.41733501,  0.44423895],
       [ 0.37650152, -1.00886978, -0.27706151,  0.44423895],
       [-1.48645793, -1.00886978, -0.13678801,  0.44423895]])
In [150]:
cat_cols = ['Diabetes', 'BloodPressureProblems', 'AnyTransplants', 'AnyChronicDiseases', 'KnownAllergies', 'HistoryOfCancerInFamily']
categorical_data = medical_df[cat_cols].values
In [152]:
inputs = np.concatenate((scaled_inputs, categorical_data), axis=1)
targets = medical_df.PremiumPrice

# Create and train the model
model = LinearRegression().fit(inputs, targets)

# Generate predictions
predictions = model.predict(inputs)

# Compute loss to evalute the model
loss = rmse(targets, predictions)
print('Loss:', loss)
Loss: 3731.8234288333797
In [153]:
weights_df = pd.DataFrame({
    'feature': np.append(numeric_cols + cat_cols, 1),
    'weight': np.append(model.coef_, model.intercept_)
})
weights_df.sort_values('weight', ascending=False)
Out[153]:
feature weight
10 1 23176.017071
6 AnyTransplants 7894.201264
0 Age 4596.742766
7 AnyChronicDiseases 2654.886425
9 HistoryOfCancerInFamily 2311.829368
2 Weight 993.421080
8 KnownAllergies 300.882400
5 BloodPressureProblems 180.503577
1 Height -58.760410
4 Diabetes -429.119839
3 NumberOfMajorSurgeries -489.870967

Creating a Test Set¶

Models like the one we created are designed for real-world applications. It's common practice to set aside a small portion of the data, typically around 10%, for testing and evaluating the model's performance.

In [156]:
inputs_train, inputs_test, targets_train, targets_test = train_test_split(inputs, targets, test_size=0.1)
In [157]:
# Create and train the model
model = LinearRegression().fit(inputs_train, targets_train)

# Generate predictions
predictions_test = model.predict(inputs_test)

# Compute loss to evalute the model
loss = rmse(targets_test, predictions_test)
print('Test Loss:', loss)
Test Loss: 4435.015156954542
In [158]:
# Generate predictions
predictions_train = model.predict(inputs_train)

# Compute loss to evalute the model
loss = rmse(targets_train, predictions_train)
print('Training Loss:', loss)
Training Loss: 3646.0431646982643